


WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau




INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)



(51) International Patent Classification 3 :
C12Q 1/68; C12N 15/00, 1/20
C12N 1/00

A1

(11) International Publication Number: WO 82/02060
(43) International Publication Date: 24 June 1982 (24.06.82)



(21) International Application Number: PCT/US81/01684

(22) International Filing Date: 17 December 1981 (17.12.81)

(31) Priority Application Number: 217,643

(32) Priority Date: 18 December 1980 (18.12.80)

(33) Priority Country: US

(71) Applicant: YALE UNIVERSITY [US/US]; Woodbridge Hall, Box 1302A
Yale Station, New Haven, CT 06520 (US).

(72) Inventors: WEISSMAN, Sherman ; [street address illegible],
New Haven, CT (US). PEREIRA, Dennis ; 12 Woodbine, Westerly,
RI 02891 (US). SOOD, Ashwani ; 176 Highland Street, New Haven,
CT 06510 (US).



(81) Designated States: AT (European patent), AU, BE (European
patent), BR, CH (European patent), DE (European patent), DK,
FR (European patent), GB (European patent), JP, LU (European
patent), NL (European patent), NO, SE (European patent), SU.

Published
With international search report.
Before the expiration of the time limit for amending the
claims and to be republished in the event of the receipt
of amendments.



(54) Title: METHOD FOR CLONING GENES
(57) Abstract

A method is provided for isolating and identifying a recombinant
clone having a DNA segment therein coding for at least one desired
heterologous polypeptide, at least a short amino acid sequence of
which is known [remainder of sentence illegible]. Clones for human
histocompatibility antigens are provided, in particular, clones for
HLA-B antigens.



FOR THE PURPOSES OF INFORMATION ONLY 



Codes used to identify States party to the PCT on the front pages of
pamphlets publishing international applications under the PCT.

AT  Austria                        KP  Democratic People's Republic of Korea
AU  Australia                      LI  Liechtenstein
BR  Brazil                         LU  Luxembourg
CF  Central African Republic       MC  Monaco
CG  Congo                          MG  Madagascar
CH  Switzerland                    MW  Malawi
CM  Cameroon                       NL  Netherlands
DE  Germany, Federal Republic of   NO  Norway
DK  Denmark                        RO  Romania
FI  Finland                        SE  Sweden
FR  France                         SN  Senegal
GA  Gabon                          SU  Soviet Union
GB  United Kingdom                 TD  Chad
HU  Hungary                        TG  Togo
JP  Japan                          US  United States of America



WO 82/02060                                        PCT/US81/01684



METHOD FOR CLONING GENES 



BACKGROUND OF THE INVENTION 
The present invention relates to a method for isolating and
identifying a recombinant clone having a DNA segment therein coding
for a polypeptide at least a short amino acid sequence of which is
known. The present method is illustrated by its application to the
production of clones containing a plasmid incorporating DNA coding
for human histocompatibility antigens of the HLA-B region.

Recombinant DNA techniques are now quite well known. See, e.g.,
Morrow, "Recombinant DNA Techniques" in Methods in Enzymology, 68,
3-24 (Academic Press, 1979) and references cited therein, all of
which are incorporated herein by reference. See also UK Patent
Applications GB 2,007,675A, 2,007,676A and 2,008,123A, the
disclosures of which are also incorporated herein by reference.
Genes coding for various polypeptides may be cloned by incorporating
a DNA fragment coding for the polypeptide in a recombinant DNA
vehicle, e.g., a bacterial or viral plasmid, transforming a suitable
host, typically an Escherichia coli (E. coli) cell line, and
isolating clones incorporating the recombinant plasmids. Such clones
may be grown and used to produce the desired polypeptide on a large
scale.

DNA for cloning may be obtained in at least three ways. DNA
extracted from cellular material may be used, either as such or
after excision of various portions. Alternatively, DNA can be
synthesized chemically, provided the nucleotide sequence is known or
the amino acid sequence of the desired polypeptide is known.

A third method of producing DNA for cloning is reverse
transcriptase catalyzed synthesis in the presence of mRNA. However,
known techniques for isolating and identifying cDNA produced from
RNA by reverse transcription are inadequate for cloning cDNAs for
which the corresponding mRNA constitutes a very minor proportion of
the total mRNA from a cell type.

The use of oligonucleotide primers to detect yeast cytochrome C
mRNA and gastrin mRNA has been reported (Szostak et al, Nature, 265,
63 (1977); Noyes et al, Proc. Natl. Acad. Sci. USA, 76, 1770
(1979)). The use of dideoxynucleoside triphosphates (ddNTPs) and
arabinosyl triphosphates for sequencing DNA (Sanger et al, Proc.
Natl. Acad. Sci. USA, 74, 5463 (1977)) and for sequencing RNA
(Zimmern et al, Proc. Natl. Acad. Sci. USA, 75, 4257 (1978)) is
known.

The histocompatibility region (HLA) in humans is a complex gene
family located on chromosome 6. The genes in this region are related
to immune responses and tend to be highly polymorphic, reflecting
the highly individual immune response of humans. The best studied
and most relevant loci within the HLA region are the HLA-A, HLA-B,
HLA-C, HLA-D and HLA-DRW loci. See Perkins, "The Human Major
Histocompatibility Complex", in Basic and Clinical Immunology, 2nd
Edition, 165-174 (Lange Medical Publ., 1978) for further details
regarding this complex.

Antigens coded for by the HLA loci are primarily proteins located
on the cell's exterior surface, and allow the cells to distinguish
self from non-self. At a cellular level, these antigens are involved
in the recognition of host from foreign cells, and enable a
population of lymphocytes called helper T cells to be activated and
participate in the activation of macrophages, B cells and killer T
cells. In attempts to graft tissue from a different donor, a
mismatch of any of these HLA loci results in a graft/host reaction
which ultimately can lead to the rejection of mismatched organs by
patients.

The proteins coded for by the HLA region comprise only about 0.05
to 0.1% of the total cellular proteins and about 1% of the membrane
associated protein of lymphoblastoid cell lines. The antigens of the
HLA-A, HLA-B and HLA-C regions tend to co-purify and are therefore
difficult to isolate. In addition, isolation of these antigens from
human cell lines would be expensive. Consequently, the separation of
individual genes and production of specific antigens by conventional
procedures is fraught with difficulty.

Other genes of biological interest are also difficult to isolate,
for many of the reasons that hold for HLA antigens. Even where full
or partial amino acid sequences are available for the corresponding
proteins, a low proportion of mRNA corresponding to the gene still
makes isolation and characterization difficult.

A need therefore continues to exist for a method for cloning genes
for polypeptides whose corresponding mRNA is a minor fraction of the
total mRNA in an mRNA mixture, especially a method sensitive to mRNA
present in the mixture below the level of 2 mol%, and even below
0.01 mol%.

OBJECTS OF THE INVENTION
One object of the present invention is to provide a method for
cloning genes, where the mRNA corresponding to the gene is a very
minor constituent of an mRNA mixture.

Another object of the invention is to provide a means of producing
desired polypeptides by expression of genes coding for these
polypeptides in clones of transformed plasmid hosts containing
recombinant plasmids incorporating DNA segments for the genes.

A further object of the invention is to provide DNA segments,
plasmids incorporating the segments and transformed hosts containing
the plasmids, where the DNA segments code for desired polypeptides.

Yet another object of the invention is to provide clones containing
genes for histocompatibility antigens.

Upon further study of the specification and appended claims,
further objects and advantages of this invention will become
apparent to those skilled in the art.

SUMMARY OF THE INVENTION
Briefly, these objects and others which will hereinafter become
apparent may be achieved by a method for isolating and identifying a
recombinant clone having a DNA segment therein coding for at least
one desired heterologous polypeptide, at least a short amino acid
sequence of which is known, said method comprising the steps of:

(a) effecting cDNA synthesis on a mixture of mRNAs containing a
target mRNA coding for said at least one polypeptide, and isolating
the resultant cDNA mixture;

(b) inserting said resultant cDNA into recombinant cloning
vehicles, and transforming hosts with said vehicles; and

(c) separating the transformants and isolating and identifying a
recombinant clone containing a DNA segment which is homologous over
at least a portion thereof to at least one oligonucleotide probe
specific for said DNA segment; wherein said probe is an extension of
the nucleotide sequence of an oligonucleotide primer having a
nucleotide sequence complementary to a region of said target mRNA
coding for a portion of said known amino acid sequence, and is
complementary to a longer region of said target mRNA coding for a
longer portion of said known amino acid sequence.

DETAILED DISCUSSION
A useful method for cloning genes coding for a desired polypeptide
involves inserting a cDNA segment into a vehicle, transforming a
host and isolating recombinant clones. Where the mRNA coding for the
polypeptide is a minor constituent of a diverse mRNA mixture,
isolation and identification of the corresponding cDNA is extremely
difficult. Reverse transcription typically is effected using a
nonspecific primer, e.g., oligo(dT)12-18, which results in extension
from virtually all mRNAs. Those cDNA species from mRNA constituents
present to the extent of at least about 2 mole% of the total mRNA
will be seen distinctly above the background, while cDNA from minor
constituents will not. Until now, isolation and identification of
such minor cDNA products would have required analysis of all cDNA
produced, a formidable task which had rendered this method
practically unworkable.

The present invention provides a general method for solving this
problem. Not only is sensitivity increased by several orders of
magnitude, but the present method permits cloning genes for
polypeptides for which neither complete amino acid sequences nor
complete DNA or mRNA sequences are known. It is not even necessary
for the polypeptide to be pure, so long as a major fragment can be
isolated.
The sole requirement is knowledge of a short amino acid sequence,
e.g., 5-25, preferably at least 15, amino acid residues, in the
polypeptide of interest. An oligonucleotide primer is synthesized
which is predicted to be complementary to a region of the mRNA
coding for a portion of the known amino acid sequence. The primer is
used to prime reverse transcription on a mixture of mRNAs containing
the target mRNA. The use of a specific oligonucleotide primer gives
rise to a relatively small number of cDNA products depending on the
length of the oligonucleotide. At least a ten-base sequence is
preferred, with an undecanucleotide or a dodecanucleotide being
particularly preferred. Unless the known amino acid sequence has a
4 or 5 peptide sequence coded for by unique mRNA codons, use of a
primer having more than about 14-16 nucleotides is
counterproductive, since there is a higher probability of a mismatch
due to code degeneracy in a longer primer sequence.
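
The trade-off between primer length and code degeneracy can be made
concrete with a small count. The following Python sketch multiplies
the number of synonymous codons per residue to estimate how many
distinct mRNA sequences could encode a given peptide. The codon
counts are those of the standard genetic code; the function name and
the second example peptide are illustrative assumptions and are not
part of the disclosed method.

```python
# Number of mRNA codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "Met": 1, "Trp": 1,
    "Phe": 2, "Tyr": 2, "His": 2, "Gln": 2, "Asn": 2,
    "Lys": 2, "Asp": 2, "Glu": 2, "Cys": 2,
    "Ile": 3,
    "Val": 4, "Pro": 4, "Thr": 4, "Ala": 4, "Gly": 4,
    "Leu": 6, "Ser": 6, "Arg": 6,
}


def coding_possibilities(peptide):
    """Count the distinct mRNA sequences that could encode `peptide`.

    `peptide` is a list of three-letter residue names.
    """
    total = 1
    for residue in peptide:
        total *= CODON_COUNT[residue]
    return total


# Residues with unique codons add primer length at no cost in ambiguity.
unique_only = coding_possibilities(["Met", "Trp"])

# Each degenerate residue multiplies the number of candidate sequences.
with_arginines = coding_possibilities(["Met", "Trp", "Arg", "Arg"])

# A hypothetical five-residue stretch with no unique codons at all.
no_unique = coding_possibilities(["Val", "Ala", "Gly", "Leu", "Ser"])

print(unique_only, with_arginines, no_unique)
```

Under these assumptions a longer primer only helps when the added
residues are low in degeneracy, which is the reason for preferring
sequences rich in methionine and tryptophan.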

The mixture of extended primer cDNAs is typically still complex.
According to the present method, a further reduction in complexity
is achieved by the use of various techniques for limiting the length
of primer chain extension products.

Reverse transcription on the primer may be terminated when a
specific nucleotide is required to pair with the mRNA by, e.g.,
using only three deoxynucleoside triphosphates (dNTPs), or by
replacing one of the four dNTPs with the corresponding
dideoxynucleoside triphosphate (ddNTP) or an equivalent variant
having a base but lacking the proper 3'-OH group. The ddNTP or other
variant will be incorporated in the extended primer as the terminal
nucleotide.

The extended primer cDNAs are separated and the longer primer
extension products are sequenced. The sequences are analyzed to find
those that are complementary to an mRNA sequence coding for an
extension of the known amino acid sequence used to design the
primer. These longer oligonucleotides are used as probes for
homologous full-length cDNA.

The use of ddNTPs is especially useful to produce oligonucleotide
probes. Synthesis of cDNA is effected using an oligonucleotide
primer and the same mRNA mixture used initially, with one of the
four dNTPs replaced by the corresponding ddNTP. The ddNTPs comprise
dideoxyadenosine triphosphate (ddATP), dideoxycytidine triphosphate
(ddCTP), dideoxyguanosine triphosphate (ddGTP) and dideoxythymidine
triphosphate (ddTTP). One of these ddNTPs and the three dNTPs
containing the other three bases are supplied to the mRNA mixture
together with the oligonucleotide primer and reverse transcriptase.
The reaction is advantageously repeated three more times, using a
different ddNTP each time. Each run will produce a cDNA mixture
containing the ddNTP as its terminal nucleotide.
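
The four chain-terminated runs can be mimicked on paper. The
following Python sketch extends a primer along a single template and
stops immediately after the first incorporation of the chosen ddNTP,
on the simplifying assumption that the ddNTP completely replaces the
corresponding dNTP. The template, the primer and all function names
are hypothetical and serve only to illustrate the principle; a real
reaction acts on a whole mRNA mixture and Watson-Crick pairing alone
is modeled here.

```python
# Base-pairing tables (Watson-Crick only; wobble pairing is ignored).
RNA_TO_CDNA = {"A": "T", "C": "G", "G": "C", "U": "A"}
DNA_TO_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}


def primer_site(primer):
    """mRNA sequence (5'->3') that a DNA primer (5'->3') pairs with."""
    return "".join(DNA_TO_RNA[b] for b in reversed(primer))


def terminated_product(mrna, primer, dd_base):
    """cDNA (5'->3') made when dd_base fully replaces the matching dNTP.

    Extension starts at the primer's 3' end, reads the template toward
    its 5' end, and stops right after the first dd_base is incorporated.
    """
    start = mrna.find(primer_site(primer))
    if start < 0:
        raise ValueError("primer does not pair with this template")
    cdna = primer
    for rna_base in reversed(mrna[:start]):
        dna_base = RNA_TO_CDNA[rna_base]
        cdna += dna_base
        if dna_base == dd_base:
            break
    return cdna


# Hypothetical template and primer, chosen only for illustration.
mrna = "ACGGUACUGAUGUGG"
primer = "CCACAT"

# One run per chain terminator.
runs = {dd: terminated_product(mrna, primer, dd) for dd in "ACGT"}

# Keep only products at least four nucleotides longer than the primer.
candidates = {dd: p for dd, p in runs.items() if len(p) >= len(primer) + 4}

for dd, product in runs.items():
    print(f"dd{dd}TP run: {product} ({len(product)} nt)")
```

Each run yields a product ending in the terminator base, and only the
longer products would go forward for sequencing.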

The resultant cDNA mixture from each run is separately
fractionated, e.g., by gel electrophoresis, and those individual
cDNA bands corresponding to oligonucleotides having at least four
more nucleotides than the oligonucleotide primer are sequenced and
compared to the known partial amino acid sequence of the desired
polypeptide. Chain-terminated probes are selected whose nucleotide
sequences are complementary to an mRNA sequence coding for an
extended amino acid sequence of the known fragment of the desired
polypeptide.

It is advantageous to design a primer complementary to an mRNA
region coding for amino acids in the polypeptide which are specified
by unique codons. To the extent that code degeneracy is avoided, the
primer is predicted to be that much more likely to be complementary
to the actual mRNA sequence for the particular amino acid sequence
used as the predictive model. Where there is degeneracy in the third
nucleotide of a codon, its complementary base in the primer should
be one which can form a wobble base pair with a mismatched mRNA
base. Thus, for example, where the third nucleotide of a codon can
be either A or G, the degenerate position in the complementary
oligonucleotide primer will be occupied by T rather than C, since T
can form a wobble base pair with G as well as a normal base pair
with A, while C cannot easily form a wobble base pair with A. In
addition, known preferences for and against certain nucleotide
sequences in the species whose mRNA is being copied are used to
further refine the selection of an appropriate oligonucleotide
primer sequence.

Where cDNA synthesis is effected on the mRNA mixture in the
presence of the oligonucleotide primer but with only three of the
four dNTPs, primer extension will proceed until the messenger RNA
code contains the complement of the missing dNTP, and will stop at
that point. Candidates for oligonucleotide probes will be selected
from the primer extension products having at least three more
nucleotides than the primer. Again, these will be sequenced and
those oligonucleotides complementary to an mRNA sequence coding for
a longer portion of the known amino acid sequence of the polypeptide
will be identified, isolated and used as probes to recognize
full-length cDNA having a homologous sequence.



It will be recognized that variants of these procedures may be
used, where they achieve an equivalent result. For example, the use
of one purine or pyrimidine deoxyarabinosyl triphosphate in place of
the corresponding purine or pyrimidine deoxyribosyl triphosphate
will be equivalent to the use of the corresponding ddNTP. Of course,
purine or pyrimidine dideoxyarabinosyl triphosphates may also be
used in place of purine or pyrimidine dideoxyribosyl triphosphates
analogously to the above procedure. In each case, it is advantageous
to use four different runs, each with a different dNTP either
omitted or replaced by a variant which will result in chain
termination at that point. This further reduces the heterogeneity
and complexity of the cDNA mixtures and greatly increases the
sensitivity of the method.

A further technique for producing longer oligonucleotide probes is
to carry out cDNA synthesis on the mRNA mixture in the presence of
labeled primer, and then cleave the extended primer cDNAs with an
appropriate endonuclease. This gives longer cDNA extension products
and may be used in conjunction with chain terminated cDNA synthesis
to resolve even more complex mixtures of full-length cDNA products.
Any endonuclease capable of cleaving single-stranded DNA can be
used, e.g., H. aegyptius III (Hae III), used previously for sequence
analysis of SV40 mRNA by Reddy et al, J. Virol., 30, 279-296 (1979).
Cleavage products of appropriate chain length, e.g., 25-400
nucleotides, preferably 100-350 nucleotides, and optimally about 200
nucleotides, which retain the labeled primer chain, are isolated and
sequenced, and probes are identified as before.

The principle of the present method is the same in all cases.
First, full-length cDNA synthesis is effected on an mRNA mixture.
Then, a short oligonucleotide primer sequence is synthesized which
is predicted, on the basis of a known amino acid sequence in the
desired polypeptide, to be complementary to a target mRNA for that
polypeptide. Then, cDNA synthesis is effected on the same mRNA
mixture in the presence of the primer, and optionally in the
presence of chain terminators. Primer extension products are
separated and sequenced to find oligonucleotide probes whose
nucleotide sequence is complementary to a longer region of the mRNA
coding for an extended amino acid sequence corresponding to the
known sequence of the desired polypeptide. The full-length cDNA is
inserted in a recombinant cloning vehicle which is then used to
transform appropriate hosts, and the clones are separated and
analyzed. The probe is used to identify clones containing DNA which
is homologous in at least one region with the probe sequence.
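
The final screening step can be pictured as a search of each cloned
insert for a region matching the probe. The following Python sketch
is a deliberately crude stand-in for hybridization: it accepts only
an exact match to the probe or to its reverse complement, since the
insert is double-stranded, whereas real hybridization tolerates some
mismatches. The clone names, the insert sequences and the probe are
invented for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(dna))


def is_homologous(insert, probe):
    """True if the probe could pair with either strand of the insert.

    Exact matching is an assumption made for simplicity.
    """
    return probe in insert or reverse_complement(probe) in insert


def screen(clones, probe):
    """Names of clones whose insert contains a probe-matching region."""
    return [name for name, insert in clones.items()
            if is_homologous(insert, probe)]


# Hypothetical probe and clone inserts (one strand of each insert).
probe = "GATTACA"
clones = {
    "clone_1": "AAAGATTACAGG",   # carries the probe sequence itself
    "clone_2": "CCTGTAATCTT",    # carries its reverse complement
    "clone_3": "ACGTACGT",       # unrelated insert
}

positives = screen(clones, probe)
print(positives)
```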

Homology between clones and probes is determined, in general, by
hybridization of DNA isolated from the clones with labeled probe.
The activity of the probe can be further increased by effecting cDNA
synthesis in the presence of the primer using labeled dNTPs,
especially where a probe has already been identified and may be
obtained from a specific ddNTP channel.

A further extension of the method, which can lead to higher
sensitivity and efficiency, involves the use of a previously
identified probe as a new primer for subsequent probe synthesis.
This will be most useful where the full-length cDNA mixtures are
highly complex, or where only short amino acid sequences of the
desired polypeptide are known, separated by substantial unknown
regions. In this case, it may be necessary to isolate and sequence
several probes differing somewhat in their intermediate nucleotide
sequences before a sequence corresponding to another known sequence
in the polypeptide is reached and the correct probe is identified.
Further variants of the method of the invention will also be
apparent, all of which are merely permutations of the aforementioned
basic principle, and therefore included within the scope of this
invention.

Insertion of the cDNA from full-length reverse transcription into
a recombinant cloning vehicle, e.g., a bacterial or viral plasmid,
may be effected by any of the various techniques well known in the
art. Many plasmids are disclosed in Morrow, "Recombinant DNA
Techniques", in Methods in Enzymology, 68, at pages 5-15 (1979), and
references cited therein. Preferably, vectors existing at high copy
numbers and replicating in a relaxed mode are used, since this
increases the yield of expressed protein. The plasmid pBR322 is
frequently used as a recombinant cloning vehicle. It contains six
different types of restriction cleavage termini, and its nucleotide
sequence has been determined by Sutcliffe, Nucleic Acid Res., 5,
2721 (1978) (this reference and the references cited therein are
incorporated herein by reference).

Plasmids possessing the region of lambda recognized by lambda gene
E are called cosmids, and introduction of cloned DNA segments into
such vectors has the advantage that the vectors containing the DNA
insert can then be packaged in vitro into lambda phage particles.
The phage infect and transform bacterial hosts such as E. coli at
very high efficiency and yield a very high number of recombinant
clones.

In practice, certain recombinant cloning vehicles are more useful
than others, and the particular vehicle employed will be chosen as a
function of various known factors including, in certain cases,
guidelines governing recombinant DNA processes.

The cDNA resulting from reverse transcription is inserted into the
cloning vehicles by any one of many known techniques, depending at
least in part on the particular vehicle being used. The various
insertion methods are discussed in considerable detail in the
aforementioned Morrow reference at pages 16-18, and the references
cited therein. In general, the vehicle is cleaved by at least one
restriction endonuclease, which will either produce a blunt end or a
cohesive end. Cohesive ends can be added to blunt-ended vehicles
and/or to the DNA to be introduced. The DNA is combined with the
vehicle and ligated, if necessary, with an appropriate ligase.
Advantageously, the vehicle is treated with, e.g., nuclease-free
alkaline phosphatase, to prevent self-sealing.

Once the DNA insert is added, the cloning vehicle is used to
transform a suitable host. Such hosts are generally bacterial
strains, e.g., E. coli K12 mutants such as E. coli HB101, E. coli
RR1, and the like. The cell line LE392 constructed by Enquist is
advantageously used as a host for lambda phage, as reported by
Tiemeier et al., Nature, 263, 526 (1976).

In general, a host strain must be compatible with the selection
procedure used in selecting the clones and must be unable to
restrict DNA of "foreign" methylate. That is, when vehicles are used
whose selection characteristics rely on antibiotic resistance, the
host must not already possess an intrinsic resistance to the
antibiotic in question. In addition, the plasmid should be able to
replicate in the same host.

For example, a suitable host for the bacterial plasmid pBR322 must
be ampicillin-sensitive and/or tetracycline-sensitive, and must be
[illegible] of plasmids of the [illegible] replication mechanism,
for example, must be polA [illegible].

With regard to resistance to foreign methylate, [illegible] by
three genes, hsdR, hsdM and hsdS. The hsdR gene is thought to encode
the restriction endonuclease of the [illegible] specifically
methylates DNA synthesized in the bacteria, which allows for the
detection at the DNA level of self from nonself. The hsdS gene is
thought to be a locus which provides the substrate (DNA) binding
activity of a multimeric restriction-modification enzyme complex.
Cells of hsdM- genotype are unable to modify DNA, those of
[illegible] DNA and those of [illegible] genotype are unable either
to restrict or to modify [illegible] the K-12 modification pattern,
either because they are synthesized in vitro or because they are
isolated from non K-12 origin, a host must generally either be hsdR-
[illegible citation], 8, 22 (1980) [illegible].

[illegible] containing the DNA segment coding [illegible]
polypeptide has been isolated and identified [illegible] large
[illegible] of the recombinant clone will often [illegible]
sequences in the DNA segment coding for the desired polypeptide.
These sometimes derive from [illegible] sometimes result from
additions to the insert made for the purpose of facilitating
[illegible] into the vehicle. These extra sequences can be
eliminated by judicious digestion with the proper endonucleases. The
precise endonucleases will be readily ascertainable once the
nucleotide sequence of the areas to be removed has been determined.

In general, the inserted DNA will be trimmed to yield the coding
sequence with its terminator intact, and the amino terminal coding
sequence as devoid as possible of unnecessary DNA. The essential DNA
will be inserted in reading phase with a constitutive or readily
inducible regulon, including a ribosomal binding site, a promoter,
and an initiator codon. Advantageously, a further coding sequence is
included between the promoter and the initiator codon which codes
for a signal peptide sequence which would be attached to the desired
polypeptide and which would facilitate transport of the hybrid
preprotein to the periplasmic space, where it will be cleaved to
form the desired polypeptide. The clone may be further amplified by
chloramphenicol treatment, which can induce a relaxed response which
favors plasmid replication over other polynucleotide synthetic
processes.

The desired polypeptide may be isolated by, e.g., subjecting the
host to a cold osmotic shock procedure, which releases the content
of the periplasmic space to the medium. Another alternative or
complementary procedure involves the use of certain mutant strains
as host, such strains having highly permeable outer membranes which
permit periplasmic proteins to diffuse into the medium. These
methods of expressing the recombinant DNA are well known to the art,
and discussed in considerable detail in the aforementioned
references.

The method of the invention may be illustrated by a preferred
application to the isolation and identification of recombinant
clones containing DNA coding for the production of human
histocompatibility antigens (HLAs), particularly HLA-B antigens. The
antigens coded for by at least the HLA-A, HLA-B and HLA-C loci are
similar in molecular weight and tend to co-purify. It has now been
found that an individual HLA mRNA constitutes about 0.01 to 0.05% of
poly(A) mRNA in a cellular extract from a lymphoblastoid cell line,
in rough agreement with data regarding the percentage of total
cellular protein represented by the corresponding antigen.

The general procedure is to culture cells producing HLA antigens,
e.g., granulocytes, lymphocytes, and the like, and to extract the
RNA by a process such as that disclosed by Ghosh et al, J. Biol.
Chem., 253, 3643 (1978). Advantageously, the mRNA is enriched for
poly(A) mRNA by, e.g., chromatography on oligo(dT) cellulose,
followed by elution of the poly(A) mRNA-enriched fraction. This
fraction contains as many as 10,000-30,000 different mRNA species.

One of the HLA antigens, HLA-A2, is known to have the tetrapeptide
sequence Met-Trp-Arg-Arg. This sequence is advantageous because both
methionine and tryptophan have unique codons. The code for arginine
has a high degeneracy, since it can be coded for by any of the
codons CGN and AGPu (N represents any of the 4 nucleotides and Pu
represents a purine nucleotide, either A or G). However, it is known
that the dinucleotide CG has a low occurrence in most, although not
all, animal cell mRNA. Accordingly, AGPu is a more likely codon, and
T is selected for the degenerate position in the complementary
oligonucleotide primer, since, as noted above, T can form a wobble
base pair with G.

The undecanucleotide primer 3'-TAC ACC TCT TC-5' is synthesized,
e.g., by the triester method (Sood et al, Nucl. Acid Res., 4, 2757
(1977); Hirose et al, Tet. Lett., 28, 2449 (1978)) or other known
method of synthesizing oligonucleotides. The primer is then
phosphorylated at the 5' end and pre-hybridized with the mRNA
mixture. Chain-terminated cDNA synthesis is effected using the
primer/mRNA mixture in four separate runs, each run using a
different ddNTP as a chain terminator and the three other dNTPs, as
well as a reverse transcriptase.
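
The derivation of this primer from the tetrapeptide can be checked
mechanically. The following Python sketch writes out the predicted
mRNA for Met-Trp-Arg-Arg, using R for the A-or-G position of the AGPu
codon, and pairs each base with its complement, placing T opposite R
in line with the wobble argument in the text. The function name and
the R-to-T table entry are conventions adopted here for illustration.

```python
# Predicted mRNA codons for Met-Trp-Arg-Arg; R stands for A or G (AGPu).
CODONS = ["AUG", "UGG", "AGR", "AGR"]

# Base chosen in the primer opposite each mRNA symbol. T is placed
# opposite R because T pairs normally with A and wobble-pairs with G.
PAIR = {"A": "T", "U": "A", "G": "C", "C": "G", "R": "T"}


def primer_for(mrna, length):
    """Primer (5'->3') complementary to the first `length` mRNA bases."""
    target = mrna[:length]
    return "".join(PAIR[b] for b in reversed(target))


mrna = "".join(CODONS)                # AUGUGGAGRAGR
primer_5_to_3 = primer_for(mrna, 11)  # undecanucleotide
primer_3_to_5 = primer_5_to_3[::-1]   # same molecule, written 3'->5'

print("5'-" + primer_5_to_3 + "-3'")
print("3'-" + primer_3_to_5 + "-5'")
```

Read 3' to 5', the result is TACACCTCTTC, the undecanucleotide named
in the text; read 5' to 3', it is the stretch with which both probe
sequences reported later in this example begin.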

The cDNA from each separate run is separated, e.g., by
electrophoresis, preferably on acrylamide-urea gels, and bands
containing cDNA products which are four or more nucleotides longer
than the primer are isolated and sequenced. It is known that the
amino acid preceding those used for deducing the primer sequence is
valine, with a GUN codon in the mRNA. Thus, the cDNA products
arising from primer hybridization to HLA mRNA and extension
therefrom will contain no more than 13 and 14 nucleotides,
respectively, in the ddATP and ddCTP terminated runs. These need not
be sequenced. Accordingly, the cDNA products from the ddGTP and
ddTTP terminated reactions are analyzed.

A further amino acid sequence is now known for the HLA-B7 antigen,
and has the sequence -gly-ala-val-val-ala-ala-val-met up to the
methionine. An mRNA sequence coding for this amino acid sequence is
5'-GGN GCN GUN GUN GCN GCN GUN-3'. The corresponding complementary
probe sequence is 3'-CCN CGN CAN CAN CGN CGN CAN (TAC ACC TCT
TC)-5', with the primer sequence in parentheses. It can be seen that
a cDNA from the ddGTP run might have a ddG in the degenerate 15th
position or in the 16th position. Similarly, in the ddTTP run, a
probe might be available with a ddT in one of the degenerate
positions, i.e., 15, 18, 21, 24, 27 and/or 30. Sequencing of these
bands results in isolation of a cDNA product in the 30-nucleotide
band of the ddTTP run and one of the cDNAs in the bands at the
16-nucleotide position in the ddGTP run containing sequences
complementary to the inferred mRNA sequence corresponding to the
known amino acid sequence of HLA-B7. The nucleotide sequence of the
16-nucleotide probe is 5'-CTTCTCCACATCACAG-3', and the nucleotide
sequence of the 30-nucleotide probe is
5'-CTTCTCCACATCACAGCAGCGACCACAGCT-3'.
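
The two probe sequences can be checked against the degenerate
pattern predicted from the HLA-B7 amino acid sequence. The following
Python sketch builds that pattern from the codons given in the text,
lists the degenerate positions counted from the 5' end of the primer,
and finds where each ddNTP would first terminate extension along the
30-nucleotide sequence. The primer is written 5' to 3' here (the text
gives it 3' to 5'), and the helper names are illustrative choices.

```python
PRIMER = "CTTCTCCACAT"   # 5'->3' form of 3'-TAC ACC TCT TC-5'
PROBE_16 = "CTTCTCCACATCACAG"
PROBE_30 = "CTTCTCCACATCACAGCAGCGACCACAGCT"

# mRNA codons 5' of the primer site: Gly Ala Val Val Ala Ala Val.
UPSTREAM_CODONS = ["GGN", "GCN", "GUN", "GUN", "GCN", "GCN", "GUN"]

PAIR = {"A": "T", "U": "A", "G": "C", "C": "G", "N": "N"}


def expected_extension(codons):
    """cDNA (5'->3') copied from the codons, N marking degenerate sites."""
    mrna = "".join(codons)
    return "".join(PAIR[b] for b in reversed(mrna))


def consistent(probe, pattern):
    """True if the probe agrees with the pattern at every fixed base."""
    return all(p == "N" or p == b for b, p in zip(probe, pattern))


def first_stop(cdna, primer_length, base):
    """1-based position of the first `base` added beyond the primer."""
    for i in range(primer_length, len(cdna)):
        if cdna[i] == base:
            return i + 1
    return None


pattern = PRIMER + expected_extension(UPSTREAM_CODONS)
degenerate_positions = [i + 1 for i, p in enumerate(pattern) if p == "N"]
stops = {dd: first_stop(PROBE_30, len(PRIMER), dd) for dd in "ACGT"}

print(pattern)
print(degenerate_positions)
print(stops)
```

Position 12 is also degenerate but lies fewer than four nucleotides
beyond the primer, which is why the text lists only positions 15
through 30. Along this sequence the first G falls at position 16 and
the first T at position 30, matching the bands from which the two
probes were recovered, while A and C occur within the first three
added nucleotides.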

Although the nucleotide sequence of the primer is predicted to be
complementary to the nucleotide sequence in HLA-A2 mRNA, the
nucleotide sequence of the 30-nucleotide probe is complementary to a
region coding for a known sequence in HLA-B7 antigen. The amino acid
sequence of HLA-A2 antigen in this region is unknown, but it is
likely to be identical to that of HLA-B7 antigen in this region,
since HLA-A2 and HLA-B7 antigens are highly homologous in their
amino acid sequences (Orr et al, Proc. Natl. Acad. Sci. USA, 76,
4395 (1979)).

A poly(A) mRNA mixture is reverse transcribed
using oligo(dT) primer to form cDNA. The second strand
of the cDNA is synthesized with a DNA polymerase, followed
by digestion with S1 nuclease to remove the hairpin structure
resulting from self-priming. The plasmid pBR322 is
digested with Pst I endonuclease and homopoly(dG) tails
introduced at its 3' ends by terminal transferase. The
double-stranded cDNA derived from poly(A) mRNA is likewise
tailed with poly(dC). The two tailed DNAs are annealed and
the mixture is used to transform the bacterial host E. coli
HB101 by the calcium chloride shock procedure.

The resulting transformants may be individually
examined, but it is convenient to pool them into groups
of 100-150 (a total of 25 groups) for preliminary screening.
For each group, DNA is prepared by equilibrium density
gradient centrifugation. (See generally, Szybalski et al.,
"Equilibrium Density Gradient Centrifugation", in Cantoni
et al., Eds., Proc. in Nucleic Acid Res., 2, 311 (1971),
and references cited therein.) The DNA from each pool is
digested with the restriction endonuclease Hha I. The Pst
site in pBR322 into which the cDNAs are inserted is flanked
by Hha I sites at a distance from the two ends of the insert
of 22 and 315 nucleotides, respectively. This increases
the size of cDNA inserts by 337 nucleotides and
therefore increases the likelihood that shorter cDNA inserts
move higher than the pBR322 fragment in agarose gels.

The Hha I digest of each pool is electrophoresed on
1.5% agarose gel and Southern blots of these gels are
hybridized to the 30-nucleotide probe, which is labeled
at its 5' end with 32P-labeled phosphate. Only one DNA
band from only one pool reveals hybridization with the
30-nucleotide probe. The colonies from that pool are
streaked individually and screened as above with the same
labeled 30-nucleotide probe. Only one colony shows hybridization.
The plasmid DNA from this colony is prepared by
the equilibrium density centrifugation procedure.
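
The saving from pooling can be estimated with simple arithmetic. The Python sketch below counts hybridization assays for the two-stage screen described above (one blot lane per pool, then one test per colony of the positive pool) against examining every transformant individually. The function names and the default of a single positive pool are illustrative assumptions.

```python
def two_stage_assays(n_pools: int, pool_size: int, positive_pools: int = 1) -> int:
    """Assays needed when pools are screened first and only the positive
    pools are then broken down colony by colony."""
    return n_pools + positive_pools * pool_size


def individual_assays(n_pools: int, pool_size: int) -> int:
    """Assays needed when every transformant is examined on its own."""
    return n_pools * pool_size


if __name__ == "__main__":
    for size in (100, 150):  # pool sizes given in the text, 25 pools in total
        print(size, two_stage_assays(25, size), individual_assays(25, size))
```

With 25 pools of 100 to 150 transformants, the two-stage screen needs 125 to 175 assays rather than 2,500 to 3,750.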
The recombinant plasmid is digested with Pst I endonuclease
and the cDNA insert is purified on 4% acrylamide
gels. The size of the cDNA insert is estimated by electrophoresis
alongside known size markers to be about 1400 base
pairs. The insert includes the nucleic acid sequence coding
for the variable region of the antigen which is responsible
for its specificity. The cDNA insert is digested with endonuclease
Sau 96 I, which gives a largest fragment of about 700
base pairs in length, along with several small fragments
ranging in length from less than 100 base pairs to about
175 base pairs. The 700 nucleotide fragment constitutes
the 3' end of the cDNA clone, based on gene mapping data.
This fragment is isolated, labeled at its 5' end with polynucleotide
kinase and 32P-ATP, and redigested with Hinf I
to give two fragments about 500 and 200 base pairs long.

The partial nucleotide sequence of the 500 base pair
fragment is determined by two different sequencing procedures
to be certain of the result: the Maat-Smith procedure
(Maat et al., Nucl. Acid. Res., 5, 4537 (1978)) as
well as the chemical degradation procedure (Maxam et al.,
Proc. Natl. Acad. Sci. USA, 74, 560 (1977)).

Table I shows the correspondence between the nucleotide
sequence from this region of the cDNA clone and the
amino acid sequence from position 223 to 269 of the HLA-B7
antigen (Orr et al., Proc. Natl. Acad. Sci. USA, 76, 4395
(1979)). Except for the amino acid at position 242 (glu),
the nucleotide sequence corresponds exactly with the amino
acid sequence. Position 242 is coded for by CAG, the codon
for glutamine (gln), and the glutamate in the reported
amino acid sequence is probably an artifact arising from
adventitious hydrolysis.

TABLE I

Protein    glu asp gln thr gln asp thr glu leu val
cDNA 5' ...  G GAC CAG ACT CAG GAC ACT GAG CTT GTG

           glu thr arg pro ala gly asp arg thr phe glu lys trp ala
           GAG ACC AGA CCA GCA GGA GAT AGA ACC TTC CAG AAG TGG GCA

           ala val val val pro ser gly glu glu gln arg tyr thr cys
           GCT GTG GTG GTG CCA TCT GGA GAA GAG CAG AGA TAC ACA TGC

           his val gln his glu gly leu pro lys pro
           CAT GTA CAG CAT GAG GGG CTG CCG AAG CCA
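
The correspondence in Table I can be checked mechanically. The following Python sketch translates the cDNA codons, taken as they are recited in the claims, with the standard genetic code and compares them with the reported amino acid sequence. It assumes that the first complete codon (asp) is residue 223, which is consistent with the 47 codons spanning positions 223 to 269; all names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asp": "D", "cys": "C", "gln": "Q", "glu": "E",
    "gly": "G", "his": "H", "leu": "L", "lys": "K", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}

CDNA = ("GAC CAG ACT CAG GAC ACT GAG CTT GTG "
        "GAG ACC AGA CCA GCA GGA GAT AGA ACC TTC CAG AAG TGG GCA "
        "GCT GTG GTG GTG CCA TCT GGA GAA GAG CAG AGA TAC ACA TGC "
        "CAT GTA CAG CAT GAG GGG CTG CCG AAG CCA").split()

REPORTED = ("asp-gln-thr-gln-asp-thr-glu-leu-val-"
            "glu-thr-arg-pro-ala-gly-asp-arg-thr-phe-glu-lys-trp-ala-"
            "ala-val-val-val-pro-ser-gly-glu-glu-gln-arg-tyr-thr-cys-"
            "his-val-gln-his-glu-gly-leu-pro-lys-pro").split("-")

FIRST_POSITION = 223  # assumption: the first complete codon is residue 223


def mismatches():
    """List (position, codon, encoded residue, reported residue) for each disagreement."""
    found = []
    for offset, (codon, residue) in enumerate(zip(CDNA, REPORTED)):
        encoded = CODON_TABLE[codon]
        if encoded != THREE_TO_ONE[residue]:
            found.append((FIRST_POSITION + offset, codon, encoded, residue))
    return found


if __name__ == "__main__":
    print(len(CDNA), "codons compared")
    print(mismatches())
```

Under these assumptions the only disagreement is at position 242, where the codon CAG encodes glutamine while the reported residue is glutamate.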



Because of the high degree of homology between the
various HLA genes, the cDNA clone for HLA-B7 antigen will
hybridize with all of the other genes for antigens in the
HLA-A and HLA-B series and possibly the HLA-C series. Thus,
it is possible to isolate and identify a large family of
clones for a wide range of histocompatibility antigens.
These clones, in turn, can be used to produce labeled HLA
antigens by growth with 14C-labeled and/or 35S-labeled
methionine in the minicell system. Under these conditions,
only plasmid coded proteins would be labeled. Using the
long-lasting 14C-isotopes, a set of labeled reagents can be
produced that can be used for rapid serotyping of individuals
for histocompatibility antigen types. The 14C-labeled
antigens could be used with various antisera alone
and in the presence of extracts of cells or cell membranes
from the individual to be tested. Immunoprecipitation of
the radioactive probe would be blocked only if the tested
individual had a histocompatibility type sharing antigenic
specificity with that particular probe.

Monoclonal antibodies have been produced against
various HLA antigens (Brodsky et al., Immunol. Rev., 47, 3
(1979)), but this has in the past been a very complex and
difficult task. The use of plasmids producing individual
HLA proteins isolated and identified according to the
present method greatly simplifies this process, and is
capable of generating purified antigens that are fully
characterized at the nucleic acid level. These antigens
can easily generate highly specific antibodies and can be
used to produce specific monoclonal antibodies without the
difficulties encountered using earlier techniques.

The general approach for cloning cDNAs from minor
mRNA components of an mRNA extract can be applied to isolate
and identify a variety of other genes of biological and
medical interest. These include any gene coding for a
polypeptide for which short amino acid sequences are known.
Methods are available to sequence very small amounts of
proteins, which means that a DNA segment can be cloned even
though both the protein and its corresponding mRNA are
present in very low amounts and are extremely difficult to
isolate and characterize fully.

Amino acid sequences are already available for certain
proteins which have not yet been cloned, and the present
method could therefore be easily applied to the production
of clones for, e.g., coagulation factors such as fibrinogen,
prothrombin and proteins in the earlier stages of blood
clotting which are present in lesser abundance, proteins of
the complement system, and peptide hormones that have not
yet been susceptible to existing cloning procedures. While
a variety of procedures have been used to clone many of the
genes for peptide hormones from pituitary, parathyroid, intestinal
mucosa and central sources, genes for other hormones
have not been cloned successfully. These include,
for example, the parathyroid hormone calcitonin and the
various hypothalamic peptides that control pituitary activity,
such as thyrotrophin releasing factor, corticotrophin
releasing factors, peptides suppressing the excretion of
growth hormone, and the like (Snyder, Science, 209, 976 (1980)),
vitamin B-12 binding protein, enzymes that are deficient in
various disorders such as Tay-Sachs disease, or enzymes of
glycogen metabolism, antigens associated with tumor cells
and various types of lymphocytes.



A further advantage of the present procedure is
that one need not even use entirely pure proteins. As
long as one protein is the major component in a mixture,
the amino acid sequence of a single degradation product
of the protein, e.g., a cyanogen bromide or a large tryptic
digest fragment, can give sufficient information to design
a primer sequence and produce and identify appropriate
probes.

It will be understood that the clones produced by 
the present method can be modified according to techniques 
well known in the art to optimize the recombinant plasmid 
for expression of the desired polypeptide. The particular 
types of optimization will generally be apparent to the 
skilled art worker.

Without further elaboration, it is believed that one
skilled in the art can, using the preceding description,
utilize the present invention to its fullest extent. The
following preferred specific embodiments are, therefore, to
be construed as merely illustrative, and not limitative of
the remainder of the disclosure in any way whatsoever. In
the following examples, all temperatures are set forth uncorrected
in degrees Celsius; unless otherwise indicated,
all parts and percentages are by weight.

Chemicals for oligonucleotide synthesis are as disclosed
in Itakura et al., J. Am. Chem. Soc., 97, 7327 (1975).
Sources of materials include: dNTPs, ddNTPs and oligo(dT)
cellulose, from Collaborative Research, Waltham, MA; reverse
transcriptase from avian myeloblastosis virus, from
Life Sciences, Inc., St. Petersburg, FL; γ-32P-ATP, of
specific activity 3000 Ci/mM, from New England Nuclear,
Boston, MA; RPMI-1640 media and E. coli tRNA, from Gibco,
Grand Island, NY; restriction endonucleases, from New
England Biolabs, Beverly, MA and/or Bethesda Research Labs,
Bethesda, MD; snake venom phosphodiesterase, from Worthington
Biochemical Corp., Freehold, NJ; calf thymus DNA, from
Sigma, St. Louis, MO; DNase, from Boehringer-Mannheim,
Indianapolis, IN.

Polynucleotide kinase is prepared according to
Richardson, Proc. Natl. Acad. Sci. USA, 54, 158 (1965).

Human lymphoblastoid cell line RPMI 4265 (HLA-A2;
A2, B7, B12) is used as a source of mRNA in these examples.
However, any cell line which is homozygous B7 can be used,
e.g., GM3161 RI640 201 0300, a specifically characterized
human lymphocyte culture, available commercially from the
Human Genetic Mutant Cell Repository, Institute for Medical
Research, Camden, N.J. Construction of lymphoblastoid cell
lines is described generally by Moore, "Cell Lines From
Humans With Hematopoietic Malignancies", in Fogh, Ed., Human
Tumor Cells In Vitro, pp. 229-332 (Plenum Press, 1975).
EXAMPLE 1

Preparation of poly(A) mRNA

RPMI 4265 cells are grown in RPMI 1640 media containing
fetal calf serum, glutamate, pyruvate, penicillin and
streptomycin to a density of 4x10^ cells/ml. RNA is extracted
as described by Ghosh et al., J. Biol. Chem., 253,
3643 (1978), except that phenol:chloroform:isoamyl alcohol
(100:100:1) is used in place of phenol extraction. Enrichment
for poly(A) mRNA is achieved by oligo(dT) cellulose
column chromatography.
EXAMPLE 2

Synthesis and characterization of labeled oligonucleotide
primer

The primer 3' TACACCTCTTC 5' is chemically synthesized
by the triester method. The general conditions for deblocking
and coupling reactions are described first, and
the specific steps are then outlined. The following abbreviations
are used:

DMT  : Dimethoxytrityl
Bz   : Benzoyl
P    : Phosphotriester group (protected)
       (P-O- indicates deblocked function)
Ar   : p-Chlorophenyl
CE   : β-Cyanoethyl
TEA  : Triethylamine
BSA  : Benzenesulfonic acid
TPST : Triisopropylbenzenesulfonyltetrazole
C    : Deoxyribosylcytosine
A    : Deoxyribosyladenine
T    : Deoxyribosylthymine

Deblocking of an oligonucleotide with TEA to remove the CE
group.

To a solution of fully protected oligonucleotide
(0.1 mmole) in anhydrous pyridine (10 ml) is added TEA (2
mmoles). The reaction is complete in 2-3 hours as judged by
silica gel thin layer chromatography. The solution is then
evaporated to a foam in order to remove excess TEA and
acrylonitrile liberated during the deblocking reaction. The
foamy material is used as such in the coupling reaction.

Deblocking of an oligonucleotide with BSA to remove the DMT
group.

The solution of an oligonucleotide (0.1 mmole) in 10
ml of chloroform-methanol (7:3) is cooled in ice water.
Then 100 mg of BSA is added to it, with stirring. The reaction
is monitored by silica gel thin layer chromatography.
Upon completion, after neutralization with a solution of 5%
sodium bicarbonate, the reaction mixture is extracted with
chloroform. The solvents are evaporated under vacuum.

Coupling reaction of oligonucleotides with TPST to lengthen
the chain.

To 0.1 mmole of TEA-deblocked oligonucleotide is
added 0.12 mmole of BSA-deblocked oligonucleotide. The
mixture is dissolved in 10 ml of dry pyridine, the solution
is evaporated to dryness, and the syrupy residue is redissolved
in 5 ml of dry pyridine, followed by addition of 0.3
mmole of TPST. After stirring at room temperature for 1 to
5 hours, the reaction mixture is decomposed with distilled
water (0.2 ml). The resultant solution is evaporated to a
gum in vacuo. The gum is dissolved in ice-cold chloroform
(20 ml) followed by neutralization of acid with a 5% solution
of sodium bicarbonate in an ice-water bath. The chloroform
layer is separated and the aqueous layer is washed
twice with chloroform (15 ml each), dried over anhydrous
sodium sulfate and evaporated to dryness. The crude mixture
is purified by silica gel chromatography.

The reaction sequence is shown schematically as
follows:

1   DMT-C(Bz)-P(Ar)-CE        --TEA-->   2   DMT-C(Bz)-P(Ar)-O-
3   DMT-T-P(Ar)-CE            --BSA-->   4   HO-T-P(Ar)-CE
2 + 4                         --TPST-->  5   DMT-C(Bz)-P(Ar)-T-P(Ar)-CE
5                             --TEA-->   6   DMT-C(Bz)-P(Ar)-T-P(Ar)-O-
6 + 4                         --TPST-->  7   DMT-C(Bz)-P(Ar)-T-P(Ar)-T-P(Ar)-CE
7                             --TEA-->   8   DMT-C(Bz)-P(Ar)-T-P(Ar)-T-P(Ar)-O-
5                             --BSA-->   9   HO-C(Bz)-P(Ar)-T-P(Ar)-CE
8 + 9                         --TPST-->  10  DMT-C-P-T-P-T-P-C-P-T-P-CE

[intermediate steps illegible]

21  DMT-C-P-T-P-T-P-C-P-T-P-C-P-C-P-A-P-C-P-A-P-CE
21                            --TEA-->   22
22 + HO-T-OBz                 --TPST-->  23
23                            --BSA, NH4OH-->

5' HO-C-P-T-P-T-P-C-P-T-P-C-P-C-P-A-P-C-P-A-P-T-OH 3'

(In compounds 10 and 21 each P carries an Ar group and the
protected bases carry Bz; in the final product each P is a
deblocked phosphodiester, P-O-.)
The primer is phosphorylated at the 5' end with
[illegible] using polynucleotide kinase to a specific activity
of 240 [illegible], by incubation of 0.5 µg of primer
in 100 µl of Tris pH 8.5 with 10 units of T4 polynucleotide
kinase at 37°C for 40 minutes. The labeled
primer is purified on 15% polyacrylamide gel, eluted with
0.1 M saline sodium citrate (SSC) (3 ml). The eluent
is desalted by passage over a column
of Sephadex G-10 followed by lyophilization.

The primer is characterized by the two dimensional
electrophoresis homochromatography technique of Jay et al.,
Nucl. Acids Res., 1, 331 (1974). The partial snake venom
phosphodiesterase digestion mixture (15 µl) contains 20 mM
Tris H+ pH 8.5, 10 mM MgCl2, labeled primer (100,000 cpm),
1 µg of calf thymus DNA, 1 µl of 1 µg/ml DNase, and 3 µl
of 0.2 mg/ml snake venom phosphodiesterase at 37°C.
Aliquots of 2-3 µl are withdrawn after every three minutes
into 3 µl of 100 mM EDTA. The total reaction mixture
is then evaporated to dryness, dissolved in 5 µl of water,
spotted on a cellulose acetate strip (2x50 cm) and electrophoresed
at 3000 volts for 40 min. After transfer to
DEAE cellulose plates, the second dimension is run in
25% homo-B (homo B: 8M urea in 1:3 ratio) until the orange
dye is 10 cm from the top. (Homo B is a mixture of 7M
urea and a partial hydrolysate of E. coli tRNA with KOH;
Brownlee et al., Eur. J. Biochem., 11, 395 (1969).)
EXAMPLE 3

Pre-hybridization of primer/RNA mixture and cDNA synthesis

First, 0.25 µg (75 pmoles) of 32P-labeled primer
is mixed with 80 µg of the poly(A) mRNA of Example 1 in
60 µl of 80 mM KCl. The mixture is then heated to 90°C
for 10 min, followed by addition of 5 µl of 1M Tris
pH 8.3. This mixture is incubated for 2 hr at 41°C. The
cDNA synthesis is performed using the above primer/RNA
mixture. Each of the four dideoxynucleoside triphosphate
terminated cDNA synthesis reactions (100 µl) contains
the above primer/RNA mixture (60 µl), 5 mM MgCl2, 10 mM
dithiothreitol (DTT), 500 µM each of the other three deoxynucleoside
triphosphates and 5 µl of (30 units/µl)
reverse transcriptase. In each case, the reaction is
incubated at 41°C for 3 hr; 125 µl of water is added,
followed by extraction with 100 µl of phenol:chloroform
(1:1). The precipitate is dissolved in 25 µl of 0.1M
NaOH and incubated at 41°C for 1 hr. An equal volume
of 8M urea, 0.05% xylenecyanol/0.05% bromophenol blue
is added. The mixture is heated at 90°C for 1 minute
and layered on 12.5% acrylamide-7M urea gel (40x20 cm).
The electrophoresis is performed in 50 mM Tris borate
pH 8.3, 1 mM EDTA.
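
The stated primer quantities can be sanity-checked: 0.25 µg and 75 pmoles together imply a molar mass for the 11-nucleotide primer. A small Python sketch of that arithmetic follows. The function name is illustrative, and the comparison figure of roughly 300 to 330 g/mol per nucleotide is a commonly used approximation rather than a value taken from the text.

```python
PRIMER = "CTTCTCCACAT"  # 11-nucleotide primer from the text


def implied_molar_mass(mass_ug: float, amount_pmol: float) -> float:
    """Molar mass (g/mol) implied by a mass in micrograms and an amount in picomoles."""
    grams = mass_ug * 1e-6
    moles = amount_pmol * 1e-12
    return grams / moles


if __name__ == "__main__":
    molar_mass = implied_molar_mass(0.25, 75)
    per_nucleotide = molar_mass / len(PRIMER)
    print(f"{molar_mass:.0f} g/mol for the primer, "
          f"{per_nucleotide:.0f} g/mol per nucleotide")
```

The implied values, about 3,333 g/mol for the primer and about 303 g/mol per nucleotide, fall within the usual range for a short single-stranded oligodeoxynucleotide, so the two stated quantities are mutually consistent.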
EXAMPLE 4

Characterization of cDNA products obtained from dideoxynucleoside
triphosphate terminated cDNA synthesis

An individual cDNA band is cut out from the gel,
homogenized and soaked overnight with 3.5 ml of 0.1 X
SSC. After centrifugation, 20 µg of E. coli tRNA is added
to 3 ml of the supernatant. The supernatant is made 0.3M
in sodium chloride and precipitated with 3 volumes of
ethanol. The precipitate is dissolved in 50 µl of water,
centrifuged to remove residual acrylamide and then evaporated
to dryness. The partial snake venom phosphodiesterase
digestion mixture (15 µl) contains 20 mM Tris H+ pH 8.5,
10 mM MgCl2, labeled cDNA, 1 µg of calf thymus DNA, 1
µl of 1 µg/ml DNase, and 3 µl of 0.2 mg/ml snake venom
phosphodiesterase at 37°C. Aliquots of 2-3 µl are withdrawn
after every three minutes into 3 µl of 100 mM EDTA.
The total reaction mixture is then evaporated to dryness,
dissolved in 5 µl of water, spotted on a cellulose acetate
strip (2x50 cm) and electrophoresed at 3000 volts for
45 min to 1 hr. After transfer to DEAE cellulose plates,
the second dimension is run in 75% homo-B (Homo B: 8M
urea in 3:1 ratio) until the orange dye is 2 to 5 cm from
the top. The homochromatography in 75% homo-B is run
for 16 hours at 63°C in order to achieve better resolution
of the longer oligonucleotides of the partial digest.

Sequencing is performed on any bands for 15 or
more nucleotides in the ddGTP run and bands for 15, 18,
21, 24, 27 and 30 nucleotides in the ddTTP run. From
the nature of the dideoxy G reaction, a G residue is not
expected between the 3' end of the primer sequence and
the 15th residue of the cDNA product. This is useful
in drawing "branches", each representing a two-dimensional
map [illegible].

In fact, a 30-nucleotide oligomer from the ddTTP
run has the sequence 5' CTTCTCCACATCACAGCAGCGACCACAGCT 3',
which corresponds to the mRNA sequence
5' AGCUGUGGUCGCUGCUGUGAUGUGGAGAAG 3', coding for NH2-Ala-
Val-Val-Ala-Ala-Val-Met-Trp-Arg-Arg. A 16-nucleotide
oligomer from the ddGTP run has the sequence
5' CTTCTCCACATCACAG 3', the first half of the 30-nucleotide
sequence. The 30-nucleotide long probe (20,000 cpm) was
used to identify the HLA cDNA clone.
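
The relationship between the 30-nucleotide probe, the mRNA sequence and the encoded peptide can be verified with a few lines of Python. The sketch below takes the reverse complement of the probe, rewrites it as RNA, and translates it with the standard genetic code in the reading frame that starts at the second nucleotide. The function names and the choice of frame are illustrative assumptions made to match the peptide given in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PROBE_30 = "CTTCTCCACATCACAGCAGCGACCACAGCT"


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons only, starting at 0-based offset `frame`."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    coding = reverse_complement(PROBE_30)
    print(coding.replace("T", "U"))   # mRNA-sense sequence
    print(translate(coding, frame=1))  # one-letter amino acid codes
```

The translation gives Ala-Val-Val-Ala-Ala-Val-Met-Trp-Arg in one-letter form; the trailing AG is an incomplete codon, which the text reads as the start of the second arginine.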
EXAMPLE 5

Construction and screening of cDNA library

The cDNA library is constructed from the cell line
RPMI 4265. The poly(A) mRNA of Example 1 is reverse
transcribed using oligo(dT) as primer according to the
procedure of Buell et al., J. Biol. Chem., 253, 2483 (1978).
The second strand of the cDNA is synthesized with the
proteolytic fragment of DNA polymerase I, followed by
digestion with S1 nuclease to remove the hairpin structure
resulting from self-priming, according to the procedure
of Buell et al., J. Biol. Chem., 253, 2483 (1978). The
resultant double-stranded cDNA is tailed with poly(dC)
at its 3' ends according to the procedure of Nelson et
al., Methods Enzymol., 68, 41 (1979).

The plasmid pBR322 is digested with Pst I endonuclease
and tailed with homopoly(dG) at its 3' ends according
to the procedure of Nelson et al., ibid. The tailed plasmid
DNA and the tailed cDNA are annealed, and the resultant
recombinant plasmids are used to transform the E. coli
K-12 mutant strain HB101 by the calcium chloride shock
procedure of Katcoff et al., Proc. Natl. Acad. Sci. USA,
77, 960 (1980).

The resultant transformants are pooled into groups
of 100-150 (total 25 groups) and inoculated at a density
of approximately 1x10^ bacteria/ml in complex liquid
medium (i.e. L-broth) and grown with shaking at 37°C
until reaching a density of 2-6x10^ bacteria/ml.
Chloramphenicol is added to a final concentration of 20
µg/ml (this step is optional) and the bacteria are incubated
an additional 12-16 hr at 37°C. The bacteria are concentrated
by centrifugation and then suspended in 5 ml 0.05M
Tris-HCl pH 8.0/25% sucrose (N.B. all values are based
on an inoculated starting culture of 50-1000 ml). While
on ice, 0.5 ml of a solution of 20 mg lysozyme/ml is added
and briefly swirled. Precooled 0.5M EDTA pH 8.0 is added
(0.5 ml), briefly swirled and incubated for 5 min on ice.
Lysis is completed by the addition of 3 ml of 3x Triton
mix (3 ml 10% Triton X100; 75 ml of 0.25M EDTA; 15 ml
1M Tris pH 8.0; 7 ml H2O) and incubated on ice for 10
min. The lysed bacteria are centrifuged at 20,000 rpm
for 30 min in a SS-34 rotor. The supernatant is decanted
and 0.9 g of CsCl and 0.1 ml of ethidium bromide (EtBr)
(10 mg EtBr/ml) are added per ml of supernatant. The
equilibrium gradient is generated by centrifuging at
approximately 200,000 G for 36-48 hr and the plasmid
band is withdrawn with a syringe. The EtBr is removed
by extracting with isopropanol equilibrated with 0.5M Tris
pH 8.0; 0.01M EDTA pH 8.0; 0.05M NaCl that has been
saturated with CsCl. The CsCl is removed by dialysis
against 5 mM Tris pH 8.0; 1 mM EDTA.

To insure better separation of the cDNA insert,
2 µg of DNA from each pool is incubated in 20 µl of 50
mM NaCl, 6 mM Tris pH 7.9, 6 mM MgCl2, 5 mM mercaptoethanol
and 2 units of Hha I at 37°C for 1 hr. The digests
are electrophoresed on 1.5% agarose gel. The gel is
blotted to a nitrocellulose filter as described by
Southern, J. Mol. Biol., 98, 503 (1975). The filter is
washed with 2 x SSC, air dried, and heated at 68-80°C
for 5 hr under vacuum. The dried filter is presoaked
in 3 x SSC, 1 x Denhardt's (0.02% polyvinyl pyrrolidone,
0.02% Ficoll, 0.02% BSA) at 50°C for 1 hr. It is then
incubated with the 32P-labeled 30-nucleotide probe (20,000
cpm) of Example 4, in 25 ml of 3 x SSC, 1 x Denhardt's,
0.1% SDS (sodium dodecyl sulfate) at 50°C for 36 hours
followed by washing once with 3 x SSC, 1 x Denhardt's
and twice with 2 x SSC at 50°C. After air drying, it
is subjected to autoradiography at -70°C with an
intensifying screen.

The colonies from the one pool showing hybridization
are streaked and hybridized with the same 30-nucleotide
probe (Grunstein et al., Proc. Natl. Acad. Sci. USA, 72,
3961 (1975)). Only one colony from that pool hybridizes
to the probe. Each pool of DNA before Southern blotting
is complex, showing the extremely sensitive separation
achieved by the present method.

The isolated plasmid DNA from the recombinant clone,
obtained by insertion of cDNA coding for HLA-B7 antigen
protein and including a nucleic acid sequence coding for
the variable region of HLA-B7, is denoted pDP001. A deposit
of this plasmid, Escherichia coli pDP001 DNA, has been
made in the American Type Culture Collection, Rockville,
MD, and has been assigned ATCC No. 40032. The clone itself,
Escherichia coli pDP001, has also been deposited, and
is assigned ATCC No. 31748.
EXAMPLE 6

Nucleotide sequence of the cDNA clone

The plasmid DNA from the recombinant clone is digested
with restriction endonuclease Pst I. The cDNA insert
is approximately 1400 base pairs long, as determined by
electrophoresis alongside DNAs of known base pair length.
The insert is isolated by electrophoresis on 4%
acrylamide gel. The isolated DNA (20 µg) is incubated
in 60 mM NaCl, 6 mM Tris H+ pH 7.4, 15 mM MgCl2, [illegible]
mercaptoethanol and 10 units of Sau 96 I at 37°C for 2
hr followed by electrophoresis on 4% acrylamide gel. The
largest fragment, about 700 base pairs long, is isolated,
labeled at its 5' end with polynucleotide kinase and
32P-ATP, and redigested with Hinf I. The mixture is
electrophoresed on 5% acrylamide gel. Two fragments,
approximately 500 and 200 base pairs long and each
labeled at one end, are isolated.

The partial nucleotide sequence of the 500 base
pair fragment starting from its 5' end, determined by
the procedure of Maat et al., Nucl. Acid Res., 5, 4537
(1978), as well as a second partial sequencing, effected
by the chemical degradation procedure of Maxam et al., Proc.
Natl. Acad. Sci. USA, 74, 560 (1977), gives the partial
sequence of Table I.

The preceding examples can be repeated with similar
success by substituting other polypeptides for which a
partial amino acid sequence is known or can readily be
determined, and by using other cloning vehicles, hosts,
chain-termination procedures, sequencing procedures,
restriction endonucleases, oligonucleotide primers, cell
lines and/or other generically or specifically described
materials and/or operating conditions of this invention
for those used in the preceding examples.

From the foregoing description, one skilled in
the art can easily ascertain the essential characteristics
of this invention and, without departing from the spirit
and scope thereof, can make various changes and modifications
of the invention to adapt it to various usages
and conditions.

We claim:

1. A method for isolating and identifying a recombinant
clone having a DNA segment coding
for at least one desired heterologous polypeptide, at
least a short amino acid sequence of which is known, said
method comprising the steps of:

(a) effecting cDNA synthesis on a mixture of mRNAs
containing a target mRNA coding for said at least one
polypeptide, and isolating the resultant cDNA mixture;

(b) inserting said resultant cDNA into recombinant
cloning vehicles, and transforming hosts with said vehicles;
and

(c) separating the transformants and isolating
and identifying a recombinant clone containing a DNA segment
which is homologous over at least a portion thereof to
at least one oligonucleotide probe specific for said DNA
segment; wherein said probe is an extension of the
nucleotide sequence of an oligonucleotide primer having
a nucleotide sequence complementary to a region of said
target mRNA coding for a portion of said known amino acid
sequence, and is complementary to a longer region of said
target mRNA coding for a longer portion of said known
amino acid sequence.

2. The method of claim 1, wherein said oligonucleotide
probe [illegible] oligonucleotide primer [illegible]
synthesis [illegible]
and only three deoxynucleoside triphosphates;
separating extended cDNA fractions [illegible]
three more nucleotides than said [illegible]
extended fractions and sequencing said [illegible]
oligonucleotide complementary to a [illegible].

3. The method of claim [illegible], wherein said primer
has from 10 to 14 nucleotides.

4. The method of claim 2, wherein said primer-
effected cDNA synthesis is effected in the presence
of one dideoxynucleoside triphosphate having a different
base from said three deoxynucleoside triphosphates; and
wherein said extended cDNA fractions have [illegible]
nucleotides more than said primer.

5. The method of claim 4, wherein said primer 
has from 10 to 14 nucleotides. 

6. The method of claim 1, wherein said target
mRNA represents less than 2 mole percent of the total mRNA
in the mixture.

7. The method of claim 1, wherein said target
mRNA represents up to about 1 mole percent of the total
mRNA in the mixture.

8. The method of claim 4, wherein said target
mRNA represents up to about 0.1 mole percent of the total
mRNA in the mixture.

9. The method of claim 1, wherein said polypeptide
is a human major histocompatibility antigen.

10. The method of claim 9, wherein said antigen
is coded for on the HLA-B locus.

11. The method of claim 10, wherein said oligonucleotide
primer is (5'-32P)dCTTCTCCACAT-OH.

12. The method of claim 4, wherein said polypeptide
is a human major histocompatibility antigen coded for
on the HLA-B locus; wherein said oligonucleotide primer
is (5'-32P)dCTTCTCCACAT-OH; and wherein said probe is
(5'-32P)dCTTCTCCACATCACAGCAGCGACCACAGCT-OH.

13. In a recombinant cloning vehicle comprising
an inserted DNA segment coding for at least one desired
heterologous polypeptide, at least a short amino acid
sequence of which is known,

the improvement wherein said cloning vehicle is
isolated from a recombinant clone which is isolated and
identified by a process comprising the steps of:

(a) effecting cDNA synthesis on a mixture of mRNAs
containing a target mRNA coding for said at least one
polypeptide, and isolating the resultant cDNA mixture;

(b) inserting said resultant cDNA into recombinant
cloning vehicles, and transforming hosts with said vehicles;
and

(c) separating the transformants and isolating
and identifying a recombinant clone containing a DNA
segment which is homologous over at least a portion thereof
to at least one oligonucleotide probe specific for said
DNA segment; wherein said probe is an extension of the
nucleotide sequence of an oligonucleotide primer having
a nucleotide sequence complementary to a region of said
target mRNA coding for a portion of said known amino acid
sequence, and is complementary to a longer region of said
target mRNA coding for a longer portion of said known
amino acid sequence.

14. The cloning vehicle of claim 13, wherein said
at least one polypeptide is a human major histocompatibility
antigen.

15. The cloning vehicle of claim 14, wherein said
antigen is coded for on the HLA-B locus.

16. The cloning vehicle of claim 13, which is a
bacterial or viral plasmid.

17. The cloning vehicle of claim 15, which is the
recombinant plasmid obtained by inserting said DNA segment,
containing the partial nucleotide sequence (5')dG GAC
CAG ACT CAG GAC ACT GAG CTT GTG GAG ACC AGA CCA GCA GGA
GAT AGA ACC TTC CAG AAG TGG GCA GCT GTG GTG GTG CCA TCT
GGA GAA GAG CAG AGA TAC ACA TGC CAT GTA CAG CAT GAG GGG
CTG CCG AAG CCA, into the plasmid pBR322.

18. In a recombinant cloning vehicle comprising 
an inserted DNA segment coding for at least one desired
heterologous polypeptide, 

the improvement wherein said polypeptide is a human 
major histocompatibility antigen. 

19. The cloning vehicle of claim 18, wherein said 
antigen is coded for on the HLA-B locus.

20. The cloning vehicle of claim 19, which is the 
recombinant plasmid obtained by inserting said DNA segment, 
containing the partial nucleotide sequence (5')dG GAC
CAG ACT CAG GAC ACT GAG CTT GTG GAG ACC AGA CCA GCA GGA 
GAT AGA ACC TTC CAG AAG TGG GCA GCT GTG GTG GTG CCA TCT 
GGA GAA GAG CAG AGA TAC ACA TGC CAT GTA CAG CAT GAG GGG 
CTG CCG AAG CCA, into the plasmid pBR 322. 

21. A recombinant plasmid derived from the plasmid
pBR 322 and containing an inserted DNA segment coding
for a human major histocompatibility antigen coded for
on the HLA-B locus, said segment being inserted at the
Pst I cleavage site of pBR 322; wherein said inserted
DNA segment contains the partial nucleotide sequence
(5')dG GAC CAG ACT CAG GAC ACT GAG CTT GTG GAG ACC AGA
CCA GCA GGA GAT AGA ACC TTC CAG AAG TGG GCA GCT GTG GTG
GTG CCA TCT GGA GAA GAG CAG AGA TAC ACA TGC CAT GTA CAG
CAT GAG GGG CTG CCG AAG CCA.

22. In a recombinant clone derived from a host 
transformed with a recombinant cloning vehicle comprising 
an inserted DNA segment coding for at least one heterologous
polypeptide, at least a short amino acid sequence
of which is known,

the improvement wherein said cloning vehicle is 
isolated from a recombinant clone which is isolated and 
identified by a process comprising the steps of: 

(a) effecting cDNA synthesis on a mixture of mRNAs
containing a target mRNA coding for said at least one
polypeptide, and isolating the resultant cDNA mixture;

(b) inserting said resultant cDNA into recombinant
cloning vehicles, and transforming hosts with said vehicles; 
and 

(c) separating the transformants and isolating
and identifying a recombinant clone containing a DNA segment
which is homologous over at least a portion thereof to
at least one oligonucleotide probe specific for said DNA
segment; wherein said probe is an extension of the nucleotide
sequence of an oligonucleotide primer having a
nucleotide sequence complementary to a region of said
target mRNA coding for a portion of said known amino acid
sequence, and is complementary to a longer region of said
target mRNA coding for a longer portion of said known
amino acid sequence.

23. The recombinant clone of claim 22, wherein
said at least one polypeptide is a human major
histocompatibility antigen.

24. The recombinant clone of claim 23, wherein
said antigen is coded for on the HLA-B locus. 

25. The recombinant clone of claim 24, which is 
the E. coli strain HB101 transformed by a recombinant
plasmid derived from the plasmid pBR 322 and containing
an inserted DNA segment coding for a human major
histocompatibility antigen coded for on the HLA-B locus, said
segment being inserted at the Pst I cleavage site of
pBR 322; wherein said inserted DNA segment contains the
partial nucleotide sequence (5')dG GAC CAG ACT CAG GAC
ACT GAG CTT GTG GAG ACC AGA CCA GCA GGA GAT AGA ACC TTC
CAG AAG TGG GCA GCT GTG GTG GTG CCA TCT GGA GAA GAG CAG
AGA TAC ACA TGC CAT GTA CAG CAT GAG GGG CTG CCG AAG CCA.

26. The recombinant plasmid Escherichia coli
pDP001 DNA, as deposited with The American Type Culture
Collection, Rockville, MD, and having ATCC accession No. 

27. The recombinant clone Escherichia coli pDP001
as deposited with The American Type Culture Collection,
Rockville, MD, and having ATCC accession No. 31748.